Growth Curves

Julin Maloof
# A tibble: 2,513 × 10
   plantID survey_date block   row col   pop      mf   rep height_cm herbivory
   <chr>   <date>      <chr> <dbl> <chr> <chr> <dbl> <dbl>     <dbl> <chr>    
 1 D13A    2023-01-27  D1        3 A     WL2       4    11       1.3 <NA>     
 2 D14B    2023-01-27  D1        4 B     CC        5    12       3.7 <NA>     
 3 D15A    2023-01-27  D1        5 A     FR        3     6       4.1 <NA>     
 4 D15B    2023-01-27  D1        5 B     BH        5    24       3.9 <NA>     
 5 D16A    2023-01-27  D1        6 A     IH        6    12       3.5 <NA>     
 6 D16B    2023-01-27  D1        6 B     CC        3    10       2.3 <NA>     
 7 D17A    2023-01-27  D1        7 A     WL2       1     9       1.7 <NA>     
 8 D17B    2023-01-27  D1        7 B     CP2       8     4       0.5 Y        
 9 D18A    2023-01-27  D1        8 A     LV3       7    12       1.2 <NA>     
10 D18B    2023-01-27  D1        8 B     BH        6     7       2.2 <NA>     
# ℹ 2,503 more rows

Objectives

Do the different populations grow at different rates?

  • How do we do this?
  • What are some challenges?

If growth were simple

\[ height = Intercept + slope * time + Error \\ height = Intercept + slope_{pop} * time + Error \]

What do we do with this?

Need a mathematical equation that produces these types of curves

Weibull to the rescue

There are actually a dozen or more equations that fit this type of data. Some are listed here

We will use the Weibull model

\[ y(t) = \alpha - \beta \cdot e^{-(k \cdot t)^\delta} \] This models height (y) at time \(t\) as a function of parameters \(\alpha\), \(\beta\), \(k\), and \(\delta\)

Lets play around to see what each of these does

Weibull function in R

weibull <- function (t, alpha, beta, k, delta) {
  result <- alpha - (alpha - beta) * exp(-(k * t)^delta)
  return(result)
}

growth <- tibble(time = seq(0,100,.1)) %>%
  mutate(model1 = weibull(t = time,
                          alpha = 60,
                          beta = 10,
                          k = .03,
                          delta = 3))

growth %>%  ggplot(aes(x=time, y=model1)) +
  geom_line()

weibull <- function (t, alpha, beta, k, delta) {
  result <- alpha - (alpha - beta) * exp(-(k * t)^delta)
  return(result)
}

growth <- tibble(time = seq(0,100,.1)) %>%
  mutate(model1 = weibull(t = time,
                          alpha = 60,
                          beta = 10,
                          k = .06,
                          delta = 3))

growth %>%  ggplot(aes(x=time, y=model1)) +
  geom_line()

Exercise 1

  1. Plot the results of this code
  2. adjust the parameters to determine how each one affects the curve. Only change one at a time
weibull <- function (t, alpha, beta, k, delta) 
{
  result <- alpha - (alpha - beta) * exp(-(k * t)^delta)
  return(result)
}

growth <- tibble(time = seq(0,50,.1)) %>%
  mutate(model1 = weibull(t = time,
                          alpha = 60,
                          beta = 10,
                          k = .02,
                          delta = 5))

Exercise 2

Use the cleaned data sent by Julin to generate this plot:

The next problem: Parameter Estimation

Parameter estimation

  • We need a statistical model that fits our data.
  • Parameters are the model coefficients that are tuned to achieve a fit between a statistical model and data.
  • For a linear regression the model parameters are the Intercept(s) and slope(s)
  • In the plot below we might want a model that describes the relationship between engine size and m.p.g.
  • I show two different models. Which fits the data better?
  • What is the statistical criteria for judging fit?

Judge fit by SSE

  • A classic way to judge goodness of fit is by the sum squared error (SSE):

\(\sum\limits_{i=1}^N(obs_i - predicted_i)^2\)

  • OK, but how do we find the parameters the minimize SSE?
  • For linear regressions this can be mathematically solved
  • For non-linear models, such as the Weibull growth model, it is essentially trial and error.
  • The computer picks a set of parameter values, calulates SSE, and then picks a new set.
  • If the new set is better than the old set, keep the new values
  • repeat until SSE is stable
  • There are very sophisticated algorithms to do this.

Let’s give it a try!

tm2 <- tdb %>% filter(pop=="TM2") %>%
  mutate(day=as.numeric(survey_date-min(survey_date)+30),
         mf=factor(mf))

m1 <- nls(height_cm ~ weibull(day, alpha, beta, k, delta),
          data=tm2,
          start = c(alpha=60, beta=1, k = .02, delta = 5), 
          lower = c(20, 0, 0, 2),
          upper = c(120, 20, .5, 10),
          trace = TRUE, 
          algorithm = "port")
  0:     110855.43:  60.0000  1.00000 0.0200000  5.00000
  1:     47332.913:  46.1813  0.00000 0.0188439  2.77590
  2:     28882.632:  39.3578  0.00000 0.0158662  2.00000
  3:     22084.278:  43.0125  0.00000 0.0128837  2.27667
  4:     16972.307:  45.0951  0.00000 0.0116162  3.33645
  5:     13636.271:  51.6978  0.00000 0.0102960  3.86786
  6:     10786.422:  56.7693  8.30227 0.00970057  5.85156
  7:     10295.173:  56.1451  7.24880 0.00999914  7.06968
  8:     10255.319:  56.9993  7.39982 0.00989561  7.26822
  9:     10253.209:  56.8254  7.48593 0.00991289  7.43640
 10:     10253.083:  56.8511  7.51245 0.00990859  7.45918
 11:     10253.076:  56.8438  7.51932 0.00990896  7.46931
 12:     10253.075:  56.8441  7.52125 0.00990880  7.47120
 13:     10253.075:  56.8438  7.52172 0.00990881  7.47181
 14:     10253.075:  56.8438  7.52185 0.00990880  7.47195
 15:     10253.075:  56.8438  7.52188 0.00990880  7.47198

look at the result and make predictions

m1
Nonlinear regression model
  model: height_cm ~ weibull(day, alpha, beta, k, delta)
   data: tm2
    alpha      beta         k     delta 
56.843802  7.521879  0.009909  7.471984 
 residual sum-of-squares: 20506

Algorithm "port", convergence message: relative convergence (4)
tm2 <- tm2 %>% mutate(pred=predict(m1))

plot it

tm2 %>%
  ggplot(aes(x=day)) +
  geom_line(aes(y=height_cm, group=plantID), alpha = 0.3) +
  geom_line(aes(y=pred), lwd=2, color="red") +
  geom_smooth(aes(y=height_cm), se=FALSE, lty=2)

BH

bh <- tdb %>% filter(pop=="BH") %>%
  mutate(day=as.numeric(survey_date-min(survey_date)+30),
         mf=factor(mf))

m1.bh <- nls(height_cm ~ weibull(day, alpha, beta, k, delta),
          data=bh,
          start = c(alpha=60, beta=1, k = .02, delta = 5), 
          lower = c(20, 0, 0, 2),
          upper = c(120, 20, .5, 10),
          trace = TRUE, 
          algorithm = "port")
  0:     606247.22:  60.0000  1.00000 0.0200000  5.00000
  1:     178910.23:  24.7998  0.00000 0.0181295  2.00000
  2:     145516.92:  26.3452  0.00000 0.0110788  2.00000
  3:     115626.63:  33.9470  0.00000 0.00912398  2.69431
  4:     99572.650:  39.0893  0.00000 0.00673122  3.67109
  5:     91050.227:  46.4137  6.88047 0.00839971  9.10416
  6:     69026.756:  48.2725  4.71698 0.00773596  10.0000
  7:     55409.357:  61.1308  4.63383 0.00703299  10.0000
  8:     54396.099:  61.9291  4.06246 0.00682983  6.66536
  9:     53873.108:  64.8017  3.67823 0.00669957  6.72795
 10:     53872.338:  64.8702  3.66063 0.00670252  6.71254
 11:     53872.338:  64.8710  3.66062 0.00670243  6.71231
 12:     53872.338:  64.8711  3.66059 0.00670241  6.71223
m1.bh
Nonlinear regression model
  model: height_cm ~ weibull(day, alpha, beta, k, delta)
   data: bh
    alpha      beta         k     delta 
64.871084  3.660590  0.006702  6.712235 
 residual sum-of-squares: 107745

Algorithm "port", convergence message: relative convergence (4)
bh <- bh %>% mutate(pred=predict(m1.bh))

plot it

bh %>%
  ggplot(aes(x=day)) +
  geom_line(aes(y=height_cm, group=plantID), alpha = 0.3) +
  geom_line(aes(y=pred), lwd=2, color="red") +
  geom_smooth(aes(y=height_cm), se=FALSE, lty=2)

dpr

dpr <- tdb %>% filter(pop=="DPR") %>%
  mutate(day=as.numeric(survey_date-min(survey_date)+30),
         mf=factor(mf))

m1.dpr <- nls(height_cm ~ weibull(day, alpha, beta, k, delta),
          data=dpr,
          start = c(alpha=60, beta=1, k = .02, delta = 5), 
          lower = c(20, 0, 0, 2),
          upper = c(120, 20, .5, 10),
          trace = TRUE, 
          algorithm = "port")
  0:     57294.785:  60.0000  1.00000 0.0200000  5.00000
  1:     3389.9232:  20.0889  0.00000 0.0187106  2.00000
  2:     2686.3353:  22.4904  0.00000 0.0110220  2.00000
  3:     2009.3312:  23.7972  4.01427 0.0108860  3.59587
  4:     1638.1676:  24.5869  5.22008 0.0111393  6.13990
  5:     1441.6690:  31.6295  5.51617 0.00960144  8.00897
  6:     1259.8223:  30.1951  5.06509 0.0103957  7.87282
  7:     1230.6140:  30.7348  5.39541 0.0104311  9.80934
  8:     1229.5089:  30.8887  5.33048 0.0104183  10.0000
  9:     1229.5088:  30.8892  5.32920 0.0104187  10.0000
 10:     1229.5088:  30.8893  5.32923 0.0104187  10.0000
 11:     1229.5088:  30.8893  5.32922 0.0104187  10.0000
m1.dpr
Nonlinear regression model
  model: height_cm ~ weibull(day, alpha, beta, k, delta)
   data: dpr
   alpha     beta        k    delta 
30.88934  5.32922  0.01042 10.00000 
 residual sum-of-squares: 2459

Algorithm "port", convergence message: relative convergence (4)
dpr <- dpr %>% mutate(pred=predict(m1.dpr))

plot it

dpr %>%
  ggplot(aes(x=day)) +
  geom_line(aes(y=height_cm, group=plantID), alpha = 0.3) +
  geom_line(aes(y=pred), lwd=2, color="red")  +
  geom_smooth(aes(y=height_cm), se=FALSE, lty=2)

problems

  • Need to compare between populations
  • Want p-values, statistical inference